Including dynamic and phonetic information in voice conversion systems

Authors

Antonio Bonafonte

Alexander Kain

Jan P. H. van Santen

Helenca Duxans

Abstract

Voice Conversion (VC) systems modify a speaker's voice (the source speaker) so that it is perceived as if another speaker (the target speaker) had uttered it. Previously published VC approaches using Gaussian Mixture Models (GMMs) [1] perform the conversion on a frame-by-frame basis using only spectral information. In this paper, two new approaches are studied to extend GMM-based VC systems. First, dynamic information is used to build the speaker acoustic model, so that the transformation is carried out according to sequences of frames. Second, phonetic information is introduced in the training of the VC system. Objective and perceptual results compare the performance of the proposed systems.
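
To make the frame-by-frame baseline concrete, the following is a minimal sketch of a GMM-based conversion function applied independently to each spectral frame. The abstract does not give the conversion function, so the joint-density regression form used here is an assumption, as are all names (`convert_frame`, `convert_utterance`, `weights`, `mu_x`, `mu_y`, `sigma_xx`, `sigma_yx`). The model parameters are assumed to have been trained already on aligned source and target frames. It does not implement the dynamic or phonetic extensions proposed in the paper.

```python
import numpy as np

def convert_frame(x, weights, mu_x, mu_y, sigma_xx, sigma_yx):
    """Convert one source frame x using an (assumed) joint-density GMM regression.

    weights:  (M,)       mixture weights
    mu_x:     (M, D)     source means per component
    mu_y:     (M, D)     target means per component
    sigma_xx: (M, D, D)  source covariances per component
    sigma_yx: (M, D, D)  target-source cross-covariances per component
    """
    n_mix = len(weights)
    log_post = np.empty(n_mix)
    for i in range(n_mix):
        diff = x - mu_x[i]
        _, logdet = np.linalg.slogdet(sigma_xx[i])
        maha = diff @ np.linalg.solve(sigma_xx[i], diff)
        # The -D/2 * log(2*pi) term is shared by all components and cancels.
        log_post[i] = np.log(weights[i]) - 0.5 * (logdet + maha)

    # Normalize in the log domain to obtain component posteriors p(i | x).
    post = np.exp(log_post - log_post.max())
    post /= post.sum()

    # Posterior-weighted sum of the per-component linear regressions.
    y = np.zeros(mu_y.shape[1])
    for i in range(n_mix):
        shift = sigma_yx[i] @ np.linalg.solve(sigma_xx[i], x - mu_x[i])
        y += post[i] * (mu_y[i] + shift)
    return y

def convert_utterance(frames, *params):
    """Apply the conversion to each frame independently (no temporal context)."""
    return np.vstack([convert_frame(x, *params) for x in frames])
```

Because `convert_utterance` treats every frame in isolation, neighbouring frames never influence one another. That is the limitation the abstract's first extension targets by carrying out the transformation over sequences of frames.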

Related resources

Using Context-based Statistical Models to Promote the Quality of Voice Conversion Systems

This article examines methods for optimizing the performance of GMM-based voice conversion systems, taking the GMM method as the baseline to be improved. In current methods, because a single conversion function is used to convert all speech units and statistical averaging causes spectral smoothing, we will observe quality ...

A phonetic assessment of cross-language voice conversion

Cross-language voice conversion maps the speech of speaker S1 in language L1 to the voice of speaker S2 using knowledge only of how S2 speaks a different language L2. This mapping is usually performed using speech material from S1 and S2 that has been deemed “equivalent” in either acoustic or phonetic terms. This study investigates the issue of equivalence in more detail, and contrasts the perf...

A phonetic alternative to cross-language voice conversion in a text-dependent context: evaluation of speaker identity

Spoken language conversion (SLC) aims to generate utterances in the voice of a speaker but in a language unknown to them, using speech synthesis systems and speech processing techniques. Previous approaches to SLC have been based on cross-language voice conversion (VC), which has underlying assumptions that ignore phonetic and phonological differences between languages, leading to a reduction i...

Phoneme-Discriminative Features for Dysarthric Speech Conversion

We present in this paper a Voice Conversion (VC) method for a person with dysarthria resulting from athetoid cerebral palsy. VC is being widely researched in the field of speech processing because of increased interest in using such processing in applications such as personalized Text-To-Speech systems. A Gaussian Mixture Model (GMM)-based VC method has been widely researched and Partial Least ...

Generative Acoustic-Phonemic-Speaker Model Based on Three-Way Restricted Boltzmann Machine

In this paper, we argue the way of modeling speech signals based on three-way restricted Boltzmann machine (3WRBM) for separating phonetic-related information and speaker-related information from an observed signal automatically. The proposed model is an energy-based probabilistic model that includes three-way potentials of three variables: acoustic features, latent phonetic features, and speak...

Publication date: 2004
